258 research outputs found

    Risque et TAL : détection, prévention, gestion. Introduction au 1er atelier

    This article is the introduction to the first workshop dedicated to Risk and NLP, addressing the use of natural language processing methods for the detection, prevention and management of risk. The papers presented during the workshop come from both academic and industrial actors. They cover the most risk-prone domains, such as biomedicine (medicine and pharmacology), chemistry and transportation, but also address more transversal issues of human activity, such as professional environments, technical documentation and requirements. The works presented also show the variety of the processed data (intervention reports, social network communications, academic papers, surveys, technical documentation), of the objectives of the analyses (extraction of risk-related information, ambiguity control, documentation checking), and of the technical solutions (data collection, corpus analysis, resource development).

    Repérage de relations sémantiques entre termes : sur la piste de la morphologie dérivationnelle

    Our work is devoted to identifying semantic relations between terms. In this context of building structured terminologies, we are particularly interested in what a morphology-based approach can contribute compared with other techniques for acquiring semantic relations from corpora. Among the operations available in morphology, we exploit affixation and compounding. We also pay attention to the suppletion of bases. We present some interpretive patterns that emerge and indicate the semantic relations that are then likely to surface.

    Parallel sentence retrieval from comparable corpora for biomedical text simplification

    Parallel sentences provide semantically similar information which can vary along a given dimension, such as language or register. Parallel sentences with register variation (as in expert and non-expert documents) can be exploited for automatic text simplification. The aim of automatic text simplification is to make information easier to access and understand. In the biomedical field, simplification may help patients understand medical and health texts. Yet no such resources are currently available. We propose to exploit comparable corpora distinguished by their registers (specialized and simplified versions) to detect and align parallel sentences. These corpora are in French and relate to the biomedical domain. Manually created reference data show 0.76 inter-annotator agreement. Our purpose is to decide whether a given pair of specialized and simplified sentences is parallel and can be aligned. We treat this task as binary classification (alignment/non-alignment). We perform experiments with a controlled ratio of imbalance and on the highly imbalanced real data. Our results show that the method presented here can be used to automatically generate a corpus of parallel sentences from our comparable corpus.

    Detection and analysis of medical misbehavior in online forums

    Social media is an important source of information on the behaviour and habits of users. It has been used as such in public health research to monitor adverse drug effects and drug misuse, among other things. We propose to study drug non-compliance in online health forums. First, we use supervised classification to detect non-compliance messages and obtain an F-measure of 0.436. Then, we manually analyse the content of the messages to learn what kinds of behaviour can be detected, and to study the effect social media can have on patients' compliance behaviour.

    Automatic detection of parallel sentences from comparable biomedical texts

    Parallel sentences provide semantically similar information which can vary along a given dimension, such as language or register. Parallel sentences with register variation (as in expert and non-expert documents) can be exploited for automatic text simplification. The aim of automatic text simplification is to make information easier to access and understand. In the biomedical field, simplification may help patients understand medical and health texts. Yet no such resources are currently available. We propose to exploit comparable corpora distinguished by their registers (specialized and simplified versions) to detect and align parallel sentences. These corpora are in French and relate to the biomedical domain. Our purpose is to decide whether a given pair of specialized and simplified sentences should be aligned. Manually created reference data show 0.76 inter-annotator agreement. We treat this task as binary classification (alignment/non-alignment). We perform experiments on balanced and imbalanced data. The results on balanced data reach up to 0.96 F-measure. On imbalanced data, the results are lower but remain competitive when using classification models trained on balanced data. Moreover, among the three datasets exploited (semantic equivalence and inclusions), equivalence pairs are detected more effectively.

    ...des conférences enfin disons des causeries... Détection automatique de segments en relation de paraphrase dans les reformulations de corpus oraux.

    Our work addresses the automatic detection of segments in a paraphrastic rephrasing relation in spoken corpora. The proposed approach is syntagmatic. It is based on paraphrastic rephrasing markers and the specificities of spoken language. The reference data used are consensual. An automatic method based on machine learning with CRFs is proposed in order to detect the paraphrased segments. Different descriptors are exploited within a window of variable size. The tests performed indicate that segments in a paraphrastic relation are quite difficult to detect, especially with their correct boundaries. Our best averages reach up to 0.65 F-measure, 0.75 precision, and 0.63 recall. We have several perspectives for this work, both for improving the detection of segments in a paraphrastic relation and for studying the data from other points of view.
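
    Several abstracts in this listing report precision, recall and F-measure. As a reading aid, here is a minimal sketch of how the F-measure is conventionally derived from precision and recall. The function name `f_measure` and the `beta` parameter are illustrative and do not come from any of the listed papers. Note that an average of per-run F-measures, as reported above, need not equal the F-measure computed from averaged precision and recall.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Illustrative helper: not taken from the papers listed here.
    """
    # By convention the score is 0 when both precision and recall are 0.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # F1 computed from a single precision/recall pair.
    print(round(f_measure(0.75, 0.63), 4))
```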

    Detection and analysis of drug non-compliance in internet fora using information retrieval approaches

    In the health-related field, drug non-compliance occurs when patients do not follow their prescriptions and take actions that lead to potentially harmful situations. Although such situations are dangerous, patients usually do not report them to their physicians. Hence, it is necessary to study other sources of information. We propose to study online health fora with information retrieval methods in order to identify messages that contain drug non-compliance information. Information retrieval methods make it possible to detect non-compliance messages with up to 0.529 F-measure, compared to 0.824 F-measure reached with supervised machine learning methods. For some fine-grained categories and on new data, precision reaches up to 0.70.

    Simplification-induced transformations: typology and some characteristics

    The purpose of automatic text simplification is to transform technical or difficult-to-understand texts into a more accessible version. The semantics must be preserved during this transformation. Automatic text simplification can be done at different levels (lexical, syntactic, semantic, stylistic...) and relies on the corresponding knowledge and resources (lexicon, rules...). Our objective is to propose methods and material for creating transformation rules from a small set of parallel sentences that differ in their technicality. We also propose a typology of transformations and quantify them. We work with French-language data from the medical domain, although we assume that the method can be applied to texts in any language and from any domain.

    Speculation and negation detection in French biomedical corpora

    In this work, we address the detection of negation and speculation, and of their scope, in French biomedical documents. It has indeed been observed that they play an important role and provide crucial clues for other NLP applications. Our methods are based on CRFs and BiLSTM. We reach up to 97.21% and 91.30% F-measure for the detection of negation and speculation cues, respectively, using CRFs. For the computation of scope, we reach up to 90.81% and 86.73% F-measure on negation and speculation, respectively, using a BiLSTM-CRF fed with word embeddings.

    Recherche d'information médicale pour le patient : Impact de ressources terminologiques

    The right of patients to access their clinical health record is granted by the Code de Santé Publique. Yet this content remains difficult to understand. We propose an experiment in which queries defined by patients are used to find relevant documents. We use the Indri search engine, based on statistical language modeling, together with semantic resources. We focus on terminological variation (e.g. synonyms, abbreviations) to bridge expert and patient language. Various combinations of resources and Indri settings are explored, mostly based on query expansion. Our system reaches up to 0.7660 P@10 and up to 0.6793 NDCG@10.
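
    The two ranking metrics cited in this abstract, P@10 and NDCG@10, can be sketched as follows. This is a hedged illustration, not the evaluation code of the paper: the function names are hypothetical, the gain is the plain relevance grade (some evaluations use 2^rel - 1 instead), and the ideal ranking is approximated by re-sorting the retrieved list, whereas a full evaluation would use all judged documents for the query.

```python
import math
from typing import Sequence


def precision_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are relevant (grade > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain: grade at rank i is divided by log2(i + 1)."""
    # enumerate() is 0-based, so rank i + 1 gets the discount log2(i + 2).
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG normalised by the DCG of the best ordering of the same list.

    Assumption: the ideal ranking is built from the retrieved list only.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance grades for one ranked result list.
    ranked = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
    print(precision_at_k(ranked), round(ndcg_at_k(ranked), 4))
```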